AITopics | big data pipeline

big data pipeline

Artificial Intelligence for Cost-Aware Resource Prediction in Big Data Pipelines

Goyal, Harshit

arXiv.org Artificial Intelligence, Oct-8-2025

Efficient resource allocation is a key challenge in modern cloud computing. Over-provisioning leads to unnecessary costs, while under-provisioning risks performance degradation and SLA violations. This work presents an artificial intelligence approach to predict resource utilization in big data pipelines using Random Forest regression. We preprocess the Google Borg cluster traces to clean, transform, and extract relevant features (CPU, memory, usage distributions). The model achieves high predictive accuracy (R² = 0.99, MAE = 0.0048, RMSE = 0.137), capturing non-linear relationships between workload characteristics and resource utilization. Error analysis reveals impressive performance on small-to-medium jobs, with higher variance in rare large-scale jobs. These results demonstrate the potential of AI-driven prediction for cost-aware autoscaling in cloud environments, reducing unnecessary provisioning while safeguarding service quality.
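The three scores quoted in the abstract (R², MAE, RMSE) can be computed from any pair of true and predicted utilization series. The following is a minimal sketch in plain Python, not the paper's code: the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for two equal-length sequences of numbers.

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of the miss, in the target's units.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large misses more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - residual variance / total variance.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up utilization values: three exact predictions and one miss.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

RMSE is never smaller than MAE, and the gap widens when a few predictions miss by a lot. An RMSE far above the MAE, as in the figures quoted above, is what one would expect if most jobs are predicted closely and a small number of large jobs are not, which fits the abstract's remark about higher variance on rare large-scale jobs.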

artificial intelligence, data mining, machine learning, (20 more...)

arXiv:2510.05127

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.62)

High-throughput Cotton Phenotyping Big Data Pipeline Lambda Architecture Computer Vision Deep Neural Networks

Issac, Amanda, Ebrahimi, Alireza, Velni, Javad Mohammadpour, Rains, Glen

arXiv.org Artificial Intelligence, May-9-2023

In this study, we propose a big data pipeline for cotton bloom detection using a Lambda architecture, which enables real-time and batch processing of data. Our proposed approach leverages Azure resources such as Data Factory, Event Grids, REST APIs, and Databricks. This work is the first to develop and demonstrate the implementation of such a pipeline for plant phenotyping through Azure's cloud computing service. The proposed pipeline consists of data preprocessing, object detection using a YOLOv5 neural network model trained through Azure AutoML, and visualization of object detection bounding boxes on output images. The trained model achieves a mean Average Precision (mAP) score of 0.96, demonstrating its high performance for cotton bloom classification. We evaluate our Lambda architecture pipeline using 9000 images, yielding an optimized runtime of 34 minutes. The results illustrate the scalability of the proposed pipeline as a solution for deep learning object detection, with the potential for further expansion through additional Azure processing cores. This work advances the scientific research field by providing a new method for cotton bloom detection on a large dataset and demonstrates the potential of utilizing cloud computing resources, specifically Azure, for efficient and accurate big data processing in precision agriculture.
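The general idea of a Lambda architecture (a batch layer that recomputes views from the full dataset, a speed layer that covers data the last batch run has not yet seen, and a serving step that merges the two) can be sketched in a few lines. This is a toy in-memory illustration, not the authors' Azure implementation: the class `LambdaPipeline` and the names `field_id` and `bloom_count` are hypothetical.

```python
from collections import Counter


class LambdaPipeline:
    """Toy in-memory Lambda architecture (hypothetical, for illustration)."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # recomputed from scratch by run_batch()
        self.speed_view = Counter()  # incremental view of not-yet-batched data

    def ingest(self, field_id, bloom_count):
        # Every record goes to the master dataset (batch path) and is also
        # folded into the real-time view (speed path).
        self.master.append((field_id, bloom_count))
        self.speed_view[field_id] += bloom_count

    def run_batch(self):
        # Batch layer: rebuild the view from the whole master dataset, then
        # discard the speed view because the batch view now covers that data.
        view = Counter()
        for field_id, bloom_count in self.master:
            view[field_id] += bloom_count
        self.batch_view = view
        self.speed_view = Counter()

    def query(self, field_id):
        # Serving layer: merge the batch result with the real-time delta.
        return self.batch_view[field_id] + self.speed_view[field_id]


pipeline = LambdaPipeline()
pipeline.ingest("field-A", 3)
pipeline.ingest("field-A", 2)
pipeline.run_batch()
pipeline.ingest("field-A", 1)   # arrives after the batch run
print(pipeline.query("field-A"))
```

In a production system the batch run and ingestion happen concurrently, so the hand-off between the two views needs more care than the simple reset shown here.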

artificial intelligence, data mining, machine learning, (21 more...)

2305.05423

Country:

North America > United States > Georgia > Tift County > Tifton (0.28)
North America > United States > Georgia > Clarke County > Athens (0.14)
North America > United States > Texas > Lubbock County > Lubbock (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Services (1.00)
Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

How AIOps Conquers Performance Gaps on Big Data Pipelines - The New Stack

#artificialintelligence, Feb-28-2020, 08:07:07 GMT

If your data pipelines are growing in complexity beyond the point where you can manage them, you're not alone. Today's pipelines have become so massive, and are crisscrossed by so many dependencies, that it can be hard to see how all the components fit together and hard to identify the issues and opportunities that affect app performance and availability. Data stacks combine many disparate elements for data gathering and analysis, among other functions, and exponential data growth in most organizations only adds to the challenge. In such an environment, simply monitoring performance and reacting when it lags is no longer a viable approach. With AIOps (Artificial Intelligence for IT Operations), a correlated data model helps you discover the full context of your apps and system resources so that you can adequately plan, manage, and improve performance.

aiop, aiop conquer performance gap, big data pipeline, (10 more...)

Industry: Information Technology > Services (0.30)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.56)

Adding Stanford CoreNLP To Big Data Pipelines (Apache NiFi 1.1/HDF 2.1) Part 1 of 2 - Hortonworks

@machinelearnbot, Sep-19-2017, 01:55:14 GMT

The latest version of Stanford CoreNLP includes a server that you can run and access via a REST API. CoreNLP adds a lot of features, but the one most interesting to me is sentiment analysis. The download is big: it includes the models, all the JARs, and the server code. Giving the JVM 4 GB of RAM makes it run nicely. Port 9000 works for me.
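A minimal client for such a server might look like the sketch below. It assumes a CoreNLP server is already listening on localhost port 9000 and that it accepts a POST with the raw text as the body and a `properties` query parameter holding JSON. The helper names (`build_url`, `extract_sentiments`, `annotate`) are hypothetical, and the annotator list and the `sentiment` field read from the response are assumptions that should be checked against the CoreNLP version in use.

```python
import json
import urllib.parse
import urllib.request

# Assumed annotator chain for sentiment; adjust to the installed version.
PROPERTIES = {
    "annotators": "tokenize,ssplit,pos,parse,sentiment",
    "outputFormat": "json",
}


def build_url(host="localhost", port=9000):
    """Build the server URL with the properties encoded as a query parameter."""
    query = urllib.parse.urlencode({"properties": json.dumps(PROPERTIES)})
    return f"http://{host}:{port}/?{query}"


def extract_sentiments(response):
    """Pull the per-sentence sentiment label out of a parsed JSON response.

    Assumes each entry under "sentences" carries a "sentiment" field.
    """
    return [s.get("sentiment") for s in response.get("sentences", [])]


def annotate(text, host="localhost", port=9000, timeout=30):
    """POST raw text to a running server and return the parsed JSON reply."""
    request = urllib.request.Request(
        build_url(host, port),
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Example call, only meaningful while a server is running:
#   labels = extract_sentiments(annotate("I love this pipeline."))
```

Keeping URL construction and response parsing in separate functions makes both easy to check without a live server. In a NiFi flow, the same POST could in principle be issued by an HTTP processor instead of a script.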

artificial intelligence, data mining, natural language, (13 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.07)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.90)
Information Technology > Data Science > Data Mining > Big Data (0.85)
